mon: increase cache sizes #24247
Conversation
retest this please
This isn't an unreasonable amount of memory to expect a monitor to have, but right now they are often quite thin daemons (once you give them an SSD, anyway...) and the OSD map cache number is low because there were issues with monitors OOMing. So we're adding another ~500MB(+?) of memory needs to the monitor on reasonably-sized clusters, which isn't trivial.
Similarly we've been saying 1GB/TB on OSDs for a long time, but that messaging was quite confused for users of FileStore OSDs (I've certainly said on the list that I had no idea where it came from, since I think it was initially just made up without justification by some doc writer before we later decided it was a good idea?). Though I'm less worried about that change, assuming we don't backport it.
So generally I'm fine with changing these values, but we need at least some napkin math demonstrating they aren't real changes or else we need to message them pretty loudly.
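A rough version of that napkin math can be sketched as follows. The per-epoch osdmap sizes below are illustrative assumptions, not measured values; the only figures taken from this discussion are the cache size going from 10 to 500 epochs and the "~500MB" ballpark for large clusters:

```python
# Hedged napkin math for the mon osdmap cache growth (10 -> 500 cached epochs).
# Per-map sizes are assumptions for illustration; real osdmap size grows with
# the number of OSDs in the cluster.

MB = 1024 * 1024

def mon_cache_footprint(num_epochs, map_bytes):
    """Approximate memory held by caching num_epochs full osdmaps."""
    return num_epochs * map_bytes

# Small cluster: assume ~100 KB per full osdmap.
small = mon_cache_footprint(500, 100 * 1024)
# Large (~1000-OSD) cluster: assume ~1 MB per full osdmap.
large = mon_cache_footprint(500, 1 * MB)

print(f"small cluster: ~{small / MB:.0f} MB")  # ~49 MB
print(f"large cluster: ~{large / MB:.0f} MB")  # ~500 MB
```

Under these assumptions the worst case lines up with the ~500 MB figure above, while small clusters see only tens of megabytes of extra cache.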
I pushed commits that update the hardware recommendations about RAM in the docs. I also included a release note. The hardware recommendations definitely need a refresh--probably much more than I did here. (I would prefer not to block this critical fix to our defaults with a long conversation about the hardware recs, though!)
Metadata servers (ceph-mds)
---------------------------

The manager daemon memory utilization depends on how much memory its cache is
typo - s/manager/metadata/
Yeah, sounds reasonable; I like the doc changes. Reviewed-by: Greg Farnum gfarnum@redhat.com
@liewegas note there’s a merge conflict now though. :(
10 maps is too small to enable all mon sessions to keep abreast of the latest maps, especially if OSDs are down for any period of time during an upgrade. Note that this is quite a bit larger, but the memory usage of the mon will scale proportionally to the size of the cluster: 500 small osdmaps is not a significant amount of RAM, while conversely having a large cache is most important on a large cluster and those mons will generally have plenty of RAM available. Someday we should control this with a memory envelope like we do with the OSDs, but that remains future work.

Signed-off-by: Sage Weil <sage@redhat.com>
For filestore OSDs, this is probably a good idea anyway, and is generally not going to be hugely impactful on the memory footprint (where users have been told to provide 1 GB RAM per 1 TB storage for a long time now). For bluestore OSDs, this value is meaningless as we're autotuning this anyway. For mons, this is a more reasonable default.

Signed-off-by: Sage Weil <sage@redhat.com>
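As a sanity check on the long-standing FileStore guidance mentioned above, the "1 GB RAM per 1 TB storage" rule of thumb works out like this (a sketch; the ratio is the only input taken from the discussion, the drive sizes are hypothetical):

```python
# Sketch of the "1 GB RAM per 1 TB storage" FileStore rule of thumb.
# Drive and host sizes below are hypothetical examples, not recommendations.

def osd_ram_gb(storage_tb, gb_per_tb=1.0):
    """RAM suggested for OSDs backing storage_tb terabytes of storage."""
    return storage_tb * gb_per_tb

# A single 4 TB HDD OSD -> 4 GB RAM.
print(osd_ram_gb(4))        # 4.0
# A host with 12 x 4 TB OSDs -> 48 GB RAM for the OSDs alone.
print(osd_ram_gb(12 * 4))   # 48.0
```

For BlueStore OSDs this rule is moot, as the commit message notes, since the cache is autotuned against a memory target instead.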
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
* refs/pull/24247/head:
	PendingReleaseNotes: add note about increased mon memory footprint
	doc/start/hardware-recommendations: refresh recommendations for RAM
	rocksdb: increase default cache size to 512 MB
	mon: mon_osd_cache_size = 500 (from 10)

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
These two changes should mitigate the luminous->mimic upgrade disaster recently experienced by a user with a ~1000 node cluster.
I think these new defaults are reasonable, but comments welcome!
Longer term, I think we need a strategy for dynamically sizing these caches based on the size of the cluster.
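One possible shape for that dynamic sizing: derive the number of cached osdmap epochs from a memory budget rather than a fixed count, clamped to sane bounds. This is purely an illustrative sketch, not anything implemented in this PR; the budget and map sizes are hypothetical:

```python
# Illustrative sketch (not part of this PR): size the osdmap cache from a
# memory budget instead of a fixed epoch count, clamped to [min, max] epochs.

MB = 1024 * 1024

def dynamic_cache_epochs(budget_bytes, est_map_bytes,
                         min_epochs=10, max_epochs=500):
    """Number of osdmap epochs to cache given a byte budget and an
    estimated per-epoch map size, clamped to reasonable bounds."""
    if est_map_bytes <= 0:
        return max_epochs
    epochs = budget_bytes // est_map_bytes
    return max(min_epochs, min(max_epochs, epochs))

# Small cluster (~100 KB maps), 64 MB budget -> hits the 500-epoch cap.
print(dynamic_cache_epochs(64 * MB, 100 * 1024))  # 500
# Huge cluster (~4 MB maps), same budget -> only 16 epochs fit.
print(dynamic_cache_epochs(64 * MB, 4 * MB))      # 16
```

The clamping keeps the behavior no worse than today's fixed defaults at either extreme, which is roughly the "memory envelope" idea mentioned in the commit message.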